119-2010: Blistering ETL Performance Using the Intelligent, Dynamic, and Parallel Capabilities of SAS®

Author

  • David Logan
Abstract

ETL from databases of a billion or more records is a non-trivial task. SAS® can analyze the database in advance and then dynamically generate parallel SAS® jobs based on that interrogation, which lets it effectively performance-tune itself on each run. Get in, get out, get the results: the net effect is the fastest possible results with the smallest window of ETL load on the source system. The approach is scalable, efficient, and easy to develop. An added benefit is that the analysis step, which runs before the ETL, makes data visualization possible and so aids data quality checks. Reducing ETL time (by 92% in the case study), using existing resources, postponing expensive hardware upgrades, adjusting dynamically to ever-increasing data volumes, and improving information productivity are powerful arguments for adopting this approach, where feasible, in your own environment.

INTRODUCTION

Initially, the purpose of this paper was to communicate the large performance benefits of running a SAS® ETL system with an intelligent, dynamic, parallel approach, based on a specific case study at a telco client. The approach has since been used with a variety of source databases with similar benefits, so I will first explain the principles conceptually and then finish with a brief overview of the initial case study. We start with some basic parallel principles and then follow the logic through the Intelligent, Dynamic, and Parallel phases of this paper. The case study then serves as an example of applying these principles. Describing the case study implementation in full detail made the paper increasingly long, and the main thing to gain from reading it is an understanding of how to apply the principles in your own environment. The intention is therefore to keep the explanation at a high, conceptual level, so that you may recognize a scenario in your own environment where the approach can be applied successfully, too.
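The abstract describes three steps: interrogate the source, generate jobs dynamically from what the interrogation finds, and run those jobs in parallel. The Python sketch below illustrates that flow under stated assumptions; it is not the author's SAS® implementation. The source statistics are simulated with an in-memory dictionary of invented row counts per partition, the function names (interrogate_source, plan_jobs, extract, run_etl) are hypothetical, and the balancing step uses the greedy longest-processing-time heuristic, chosen here for illustration rather than taken from the paper.

```python
# Hypothetical sketch: interrogate, plan, then run extract jobs in parallel.
# Not the paper's SAS implementation; the "source database" is simulated
# by an in-memory dict of invented row counts per partition key.
import heapq
from concurrent.futures import ThreadPoolExecutor


def interrogate_source(row_counts):
    """Intelligent phase (hypothetical): return (key, rows) pairs, largest
    first, standing in for a query against source statistics."""
    return sorted(row_counts.items(), key=lambda kv: kv[1], reverse=True)


def plan_jobs(partition_stats, n_workers):
    """Dynamic phase: spread partitions over n_workers jobs with the greedy
    longest-processing-time heuristic so row totals stay roughly balanced."""
    heap = [(0, i) for i in range(n_workers)]  # (rows assigned, job index)
    jobs = [[] for _ in range(n_workers)]
    for key, rows in partition_stats:
        assigned, idx = heapq.heappop(heap)  # currently lightest job
        jobs[idx].append(key)
        heapq.heappush(heap, (assigned + rows, idx))
    return [job for job in jobs if job]  # drop jobs that got no partitions


def extract(job, row_counts):
    """Parallel phase placeholder: a real job would pull each partition from
    the source; here it only totals the rows it would read."""
    return sum(row_counts[key] for key in job)


def run_etl(row_counts, n_workers=4):
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    jobs = plan_jobs(interrogate_source(row_counts), n_workers)
    if not jobs:  # nothing to extract
        return [], []
    # Threads suit I/O-bound database pulls; map preserves job order.
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        totals = list(pool.map(lambda job: extract(job, row_counts), jobs))
    return jobs, totals


if __name__ == "__main__":
    # Invented row counts per monthly partition, for illustration only.
    counts = {"2009-10": 90, "2009-11": 70, "2009-12": 60,
              "2010-01": 40, "2010-02": 30, "2010-03": 10}
    jobs, totals = run_etl(counts, n_workers=3)
    for job, rows in zip(jobs, totals):
        print(job, rows)
```

With the invented counts above, the six partitions are split into three jobs of 100 rows each. Balancing on interrogated row counts is the point of the "intelligent" step: a fixed split could leave one job holding most of the data, and the slowest job determines the overall ETL window on the source system.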
Data Integration, SAS Global Forum 2010

Similar resources

A New Intelligent Controller for Parallel DC/DC Converters

In this paper, an immune controller is used to control paralleled DC-DC converters. A PID controller is first applied and its coefficient is optimized using an intelligent (PSO) algorithm. An immune controller is then added to the PID controller, forming an immune PID controller. Two methods have been suggested to determine the non-linear behavior of the immune controller. In the first method, an e...

Merging DEL and ETL

This paper surveys the interface between the two major logical trends that describe agents’ intelligent interaction over time: dynamic epistemic logic (DEL) and epistemic temporal logic (ETL). The initial attempt to “merge” DEL and ETL was made in [12] and followed up by [11] and [29]. The merged framework provides a systematic comparison between these two logical systems and studies new logics...

Chemical Reaction Effects on Bio-Convection Nanofluid flow between two Parallel Plates in Rotating System with Variable Viscosity: A Numerical Study

In the present work, a mathematical model is developed and analyzed to study the influence of nanoparticle concentration through Brownian motion and thermophoresis diffusion. The governing system of PDEs is transformed into a system of coupled non-linear ODEs by using suitable variables. The converted equations are then solved by using a robust shooting method with the help of MATLAB (bvp4c). The impacts o...

135-2011: Best Solutions for Tuning Performance of ETL Jobs in SAS® Data Integration Studio

SAS® Data Integration Studio is a great tool for building and maintaining data warehouses and data marts. The performance of the extract, transform, and load (ETL) job is critical for building data warehouses and data marts. This paper discusses the time-consuming data transformations related to ETL processes in SAS Data Integration Studio. The performance for each data transformation is benchm...

A hybrid CS-SA intelligent approach to solve uncertain dynamic facility layout problems considering dependency of demands

This paper aims at proposing a quadratic assignment-based mathematical model to deal with the stochastic dynamic facility layout problem. In this problem, product demands are assumed to be dependent normally distributed random variables with known probability density function and covariance that change from period to period at random. To solve the proposed model, a novel hybrid intelligent algo...

Publication date: 2010